[57] ABSTRACT

A document is read as an image file and stored. At this time,
a table is updated. In the table, each of the words included in
the text information obtained by performing character
recognition on the image information output from the reading
means is associated with the page number of the page
including that word and the storage position of the image
information. With this table, a full-text search can provide not
only the image information but also the page number of the
page including the search keyword. The provision of the page
number can facilitate search operations.
DOCUMENT INFORMATION
MANAGEMENT SYSTEM, METHOD AND 
MEMORY 

BACKGROUND OF THE INVENTION

The present invention relates to a document information 
management system for managing information (document 
information) recorded on a medium such as paper as an 
image file, a document information management method 
and a document search method. 

In general, in a filing system, information recorded on a 
paper medium such as a slip is read by means of a scanner, 
etc. under control of an exclusive-use terminal apparatus 
(client) for the filing system, and a read image (file) is 
associated with a search key and registered in a filing server. 
A search key associated with a target image file is input 
through the exclusive-use terminal apparatus, and thereby 
the target image file can be specified from a number of image 
files stored in the filing server.

The search keys are, in most cases, characteristic words
recorded on paper media, such as slip numbers, titles, or 
dates. The conventional filing system requires manual input 
of such search keys through keyboard operations. 

Accordingly, in a case where a great number of paper 
media are filed in the form of image files, the operator is 
required to input search keys through a keyboard operation 
each time paper media are read by the scanner one by one
or in units of a series. This is very time-consuming.

There is known an OCR (Optical Character Reader) for 
optically recognizing characters recorded on paper. This 
apparatus merely outputs characters such as slip numbers or 
titles recorded on slips in the form of text files, and does not
have a function of systematically storing the text files in 
association with image files. In the prior art, in order to 
systematically store the text files obtained by the OCR 
apparatus in association with image files obtained by the 
scanner, an operation is required to input information rep- 
resenting the relationship between the text files and image 
files. 

Either in a case of using the filing system or in a case of 
using the OCR apparatus, it is not possible to file image files 
by simple operations. 

BRIEF SUMMARY OF THE INVENTION 
The object of the present invention is to provide a 
document information management system, a document 
information management method and a recording medium 
for efficiently filing document information recorded on a 
medium such as paper as image files. 

The present invention has the following operational 
advantages. 

A document is read as an image file and stored. At this
time, a table is updated. In the table, each of the words
included in the text information obtained by performing
character recognition on the image information output from
the reading means is associated with the page number of the
page including that word and the storage position of the image
information. With this table, a full-text search can provide not
only the image information but also the page number of the
page including the search keyword. The provision of the page
number can facilitate search operations.
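
The table described above can be illustrated with a minimal sketch. It assumes an in-memory structure; the names WordTable, register_page and search, and the form of the storage position strings, are illustrative assumptions and do not appear in this specification.

```python
import re
from collections import defaultdict


class WordTable:
    """Illustrative table: word -> set of (page number, storage position)."""

    def __init__(self):
        self._entries = defaultdict(set)

    def register_page(self, text, page_number, storage_position):
        # Associate every distinct word of the recognized text with the
        # page it came from and the place where the page image is stored.
        for word in set(re.findall(r"\w+", text.lower())):
            self._entries[word].add((page_number, storage_position))

    def search(self, keyword):
        # Return (page number, storage position) pairs for the keyword,
        # sorted by page number; an unknown keyword yields an empty list.
        return sorted(self._entries.get(keyword.lower(), ()))
```

Each time a page image is stored, register_page would be called with the text recognized from that page, so that a later keyword search returns both where the image is and which page contains the keyword.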

BRIEF DESCRIPTION OF THE SEVERAL
VIEWS OF THE DRAWING

FIG. 1 shows the structure of a document information 
management system according to an embodiment of the 
present invention. 
FIG. 2 is a block diagram of an image input control
apparatus shown in FIG. 1.

FIG. 3 is a block diagram of a document information
accumulation unit shown in FIG. 1.

FIG. 4 is a block diagram of a hierarchical storage shown
in FIG. 1.

FIG. 5 is a block diagram of a search unit shown in FIG. 1.

FIG. 6 illustrates concepts of an image file and a text file
in the present embodiment.

FIG. 7 shows an example of title information in this
embodiment.

FIG. 8 shows an example of a display screen of a WWW
browser shown in FIG. 5.

FIG. 9 shows an example of a search request screen in this
embodiment.

FIG. 10 shows an example of a bookshelf list displayed on
the WWW browser shown in FIG. 5.

FIG. 11 shows an example of a box attribute list displayed
on the WWW browser shown in FIG. 5.

FIG. 12 shows an example of a document box list
displayed on the WWW browser shown in FIG. 5.

FIG. 13 shows an example of a screen for inputting search
conditions in the present embodiment.

FIG. 14 is a flow chart illustrating an image file search
process in this embodiment.

FIG. 15 shows an example of a display screen of a search
result displayed on the WWW browser shown in FIG. 5.

FIG. 16 shows another example of the display screen of
the search result displayed on the WWW browser shown in
FIG. 5.

FIG. 17 shows a display example of a target image
file.

FIG. 18 shows an example of information registered on a
synonym table.

FIG. 19 is a flow chart illustrating a procedure for
synonym search.

FIGS. 20A, 20B, 20C and 20D are views for describing
a file access operation in the hierarchical storage.

FIG. 21 shows a data structure on a document table.

FIG. 22 shows a data structure on a text table.

FIG. 23 shows a data structure on an image table.

FIG. 24 is a flow chart illustrating processing of a search
request from a personal computer (WWW client).

FIG. 25 is a flow chart illustrating an operation in the
document information accumulation unit (document server).

FIG. 26 shows an example of a display screen of a search
result.

FIG. 27 shows another example of the display screen of
the search result.

DETAILED DESCRIPTION OF THE
INVENTION

An embodiment of the present invention will now be 
described with reference to the accompanying drawings. 

FIG. 1 shows the structure of a document information
management system according to the embodiment. As is
shown in FIG. 1, the document information management
system of this embodiment is constituted such that an
input/output unit 10, a document information accumulation
unit 12 and a search unit 14 are connected by a network 16.

The input/output unit 10 functions to form image files by
reading information recorded on paper media such as
documents or slips by means of a color scanner 22, and
subjecting the read image files to a character recognition
process, thereby preparing text files of all the obtained
characters and accumulating the text files in the document
information accumulation unit 12.

The document information accumulation unit 12 functions
to prepare title information representing the relationship
between the image files and text files formed by the
input/output unit 10, store the image files and text files, and
search the text files on the basis of a designated word upon
a search request from the search unit 14 and output to the
search unit 14 the image file corresponding to the text file by
discriminating the image file from the title information.

The search unit 14 functions to output to the document
information accumulation unit 12 an image file search
request based on word designation and acquire from the
document information accumulation unit 12 the image file
corresponding to the search request.

The structures of the input/output unit 10, document
information accumulation unit 12 and search unit 14 will
now be described in detail.

FIG. 2 is a block diagram showing the detailed structure
of the input/output unit 10. As is shown in FIG. 2, the
input/output unit 10 in this embodiment comprises an image
file input control apparatus 20, a color scanner 22, an OCR
24 and a digital PPC 26. The image file input control
apparatus 20 is realized by a computer whose operation is
controlled by a program read out from a recording medium
such as a magnetic disk.

The image file input control apparatus 20 has functions of
preparing image files of information recorded on paper
media such as documents or slips and files of recognition
results (texts) on characters included in the image files, by
using the color scanner 22, OCR 24 and digital PPC 26, and
filing the prepared files in the document information
accumulation unit 12.

The image file input control apparatus 20 comprises, as
shown in FIG. 2, an OCR control unit 20a, a control unit
20b, an image file processing unit 20c and a storage unit 20d.

The OCR control unit 20a controls the processing in the
OCR 24, acquires image files and text files which are the
processing results in the OCR 24, and stores the acquired
files in the storage unit 20d.

The control unit 20b controls the entirety of the image file
input control apparatus 20 and performs processing for
systematically registering in the document information
accumulation unit 12 the files to be filed, which are stored
in the storage unit 20d.

The image file processing unit 20c acquires image files
obtained by the color scanner 22 or digital PPC 26 and stores
them in the storage unit 20d, and also subjects the
acquired image files to a character recognition process and
stores the recognition results (texts) in the storage unit 20d.
The image file processing unit 20c is provided with functions
of an area discrimination unit 22e and a character
recognition unit 22f.

The area discrimination unit 22e discriminates the area on
which characters are recorded, on the basis of the acquired
image file. The character recognition unit 22f recognizes the
characters included in the area discriminated by the area
discrimination unit 22e. A recognition result of the character
recognition unit 22f is stored in the storage unit 20d in
association with the image file to be processed.

The storage unit 20d stores, for example, the processing
result of the OCR 24 and the processing result of the image
file processing unit 20c, i.e. a recognition result file (text file)
22 and an image file 22h, as well as temporary data for the
processing in the image file processing unit 20c and for the
filing processing for the document information
accumulation unit 12.

The color scanner 22 reads the image file of information
recorded on a paper medium such as a document or a slip,
and outputs the image file to the image file input control
apparatus 20 (image file processing unit 20c).

The OCR 24 performs a character recognition process for
characters in the image file read by the scanner 22, which
characters are located at a predetermined position, thereby
preparing a text file. The OCR 24 outputs each file to the
image file input control apparatus 20 (OCR control unit
20a).

The digital PPC 26 has a function of reading the
information recorded on the paper medium such as a document
or a slip as an image file, a function of printing the image file
which the search unit 14 acquired from the document
information accumulation unit 12, and a function of a
copying machine.

In this embodiment, the input/output unit 10 acquires data
from the document information accumulation unit 12 as text
files or image files. However, other data files such as voice
or moving pictures may be obtained.

FIG. 3 is a block diagram showing a detailed structure of
the document information accumulation unit 12. As is
shown in FIG. 3, the document information accumulation
unit 12 in this embodiment comprises a document server 30
and a hierarchical storage 32.

The document server 30 is constituted by a computer
which reads a program recorded on a recording medium such
as a magnetic disk and has its operations controlled by the
program. The document server 30 performs a filing control
of various files between itself and the image file input
control apparatus 20 and a control of files between
itself and the search unit 14.

As is shown in FIG. 3, the document server 30 comprises
a WWW server 30a, a CGI program 30b, a database 30c, a
hard disk 30d, a filing unit 30f, and a synonym table 30g.

The WWW server 30a performs data transmission
between itself and the search unit 14 (WWW client) and has
an external program executed by using a CGI (common
gateway interface), thus enabling a file search process, etc.
to be executed. The WWW server 30a supplies an image file
acquired and stored in the hard disk 30d to an origin of a
search request.

The CGI program 30b is a program executed via the CGI.
According to this program, a search process associated with
the database 30c and a process associated with files stored in
the hierarchical storage 32 (an image file acquisition
process) are executed. The CGI is an interface through
which the WWW server 30a executes the external program.
In the search process, on the basis of a word designated by
the search request from the search unit 14, the CGI program
30b searches all characters of text files stored in the database
30c, and discriminates an associated text file. In addition, in
the image file acquisition process, the CGI program 30b,
upon receiving a select instruction from the search unit 14
for a search candidate or the text file discriminated by the
search process, acquires from the hierarchical storage 32 the
image file corresponding to the selected text file on the basis
of title information (to be described later in detail) and stores
the acquired image file in the hard disk 30d.

The database 30c stores text files from the input/output
unit 10 (image file input control apparatus 20). On the basis
of title information of each of the text files stored in the
database 30c, an associated image file can be referred to.

The hard disk 30d stores image files from the input/output
unit 10 (image file input control apparatus 20). Image files
can be read out from the hard disk 30d more quickly than from
the hierarchical storage 32. The hard disk 30d is provided
with a cache area 30e for storing the image file acquired
from the hierarchical storage 32 by the image file acquisition
process of the CGI program 30b. In the cache area 30e, the
following are designated in advance: the name of drive for
permitting access by the WWW server 30a, the directory
indicating logical memory positions, the size of area, and the
ratio of upper and lower limits of the use area to the whole
area, which is used for management of stored files (deletion
of unnecessary files).

The filing unit 30f stores the text file to be filed from the
image file input control apparatus 20 into the database 30c,
and also stores the image file into the hard disk 30d or
hierarchical storage 32. At the time of storing the image file
and the text file corresponding to the image file into the
associated recording media, the filing unit 30f prepares title
information indicating the correspondency between the
image file and text file, thus permitting reference by the CGI
program 30b.

The synonym table 30g is referred to by the CGI program 30b
to find synonyms of a word designated by a search request
from the search unit 14. A plurality of words and their
synonyms are associated and registered in the synonym table
30g. In the search process by the CGI program 30b, a search
is performed on the basis of the word designated by the
search request and, in addition, the synonym of the
designated word is discriminated from the synonym table 30g and
a search based on the synonym is also performed.

The hierarchical storage 32 is a large-capacity recording
medium and is used for accumulating image files in the
present embodiment. The hierarchical storage 32 can, of
course, store text files and data files of voice, moving
pictures, etc., too. The storage 32 may comprise DVDs
(digital video disk) and MOs (magneto-optical disk) as
recording media. In this embodiment, MOs are used.

FIG. 4 is a block diagram showing a detailed structure of
the hierarchical storage 32. The hierarchical storage 32, as
shown in FIG. 4, comprises an MO drive 32a, an
autochanger (AC) 32b and a controller 32c.

Predetermined ones of MOs (magneto-optical disks) in
the autochanger 32b are selected and mounted in the MO
drive 32a, whereby read/write of image files is executed.
The MO drive 32a in this embodiment comprises four drives
1 to 4 which are activated in accordance with the designation
of drive (e.g. E, F, G, or H) by the document server 30 (CGI
program 30b).

In the autochanger 32b, a plurality of MOs (150 MOs in
this embodiment) are mounted in slots (1 to 150).
Predetermined MOs are mounted in the drives of the MO drive 32a
on an as-needed basis.

The controller 32c enables the document server 30 (CGI
program 30b) to access (i.e. acquire image files from) the
specific MO indicated by the title information from the
document server 30, on the basis of the drive name
determined when the MO is mounted in the MO drive 32a.

FIG. 5 is a block diagram showing detailed functions and
structures of the search unit 14 which comprises a personal
computer 40. As is shown in FIG. 5, the personal computer
40 comprises the functions of a control unit 40a, an input
unit 40b, a display unit 40c and a printer unit 40d.

The control unit 40a controls the entirety of the apparatus.
The control unit 40a has functions of requesting search of an
image file, editing/processing the searched image file, and
outputting the image file by displaying or printing. Needless
to say, the control unit 40a can process data other than the
image file. The detailed structures will be described later.

The input unit 40b comprises a keyboard, etc. The input
unit 40b is used to input data, commands, etc. to the
apparatus. The display unit 40c displays an operation screen
for file search, etc. The printer unit 40d performs printing
of image files, etc., and may
be the digital PPC 26 which is used in the input/output unit
10 for reading image files.

The control unit 40a comprises the functions of a WWW
browser 40e, an edit process section 40f, a display control
section 40g, a print control section 40h and a communication
section 40i.

The WWW browser 40e performs communications with
the document server 30 (WWW server 30a) of document
information accumulation unit 12 through simple operations
using a GUI (graphical user interface), and executes a
process such as image file search.

The edit process section 40f edits or processes the data,
such as image files, which is acquired from the document
information accumulation unit 12.

The display control section 40g controls display, on the
display unit 40c, of information, etc. provided by
the WWW browser 40e. The print control section 40h
controls printing of the image file, etc. by the printer unit
40d.

The communication section 40i controls communications
with the document server 30 of document information
accumulation unit 12 or the printer unit such as digital PPC
26 over the network 16.

The operation of the document information management
system according to the present embodiment will now be
described. At first, an operation of filing information (text
file, image file) in the document information accumulation
unit 12 will be described.

When an image file has been read from paper medium
such as a document by the color scanner 22 or digital PPC
26, a text file and an image file are obtained by the
processing of the image file processing unit 20c in the image
file input control apparatus 20.

Specifically, the area discrimination unit 22e of image file
processing unit 20c discriminates a text area of the read
image file, on which characters are recorded. The character
recognition unit 22f performs character recognition on all
characters on the text area discriminated by the area
discrimination unit 22e.

The image file processed by the image file processing unit
20c and the character recognition result (text) by the
character recognition unit 22f are stored in the storage unit 20d.
FIG. 6 is a conceptual view illustrating a process of
acquiring the image file and text.

On the other hand, when a paper medium such as a slip
is processed by the OCR 24, a text file and an image file are
acquired under control of the OCR control unit 20a of image
file input control apparatus 20.

The OCR 24 reads an image file of the slip to be
processed. Characters are read from the image file to
produce a text file. The OCR control unit 20a acquires the
image file and text file, which are the processing results of
the OCR 24, and stores them in the storage unit 20d.

The image file input control apparatus 20 can register files
for the filing in the document information accumulation unit
12 so that these files may be hierarchically managed. In the
present embodiment, a minimum unit of a file is set to be one
page (corresponding to one paper sheet). A plurality of pages
is treated as one document. A plurality of documents is
managed as a document box, and a plurality of document
boxes is managed as a bookshelf. Furthermore, a plurality of
bookshelves is managed as a data source.

The document information accumulation unit 12 performs
a filing operation for the files (image files, text files) from the
image file input control apparatus 20 in the following
manner (see FIGS. 3 and 6). The filing unit 30f of document
server 30 stores the image files in the hard disk 30d or
hierarchical storage 32, and stores the text files in the
database 30c as incidental information to the image files.

In addition, the filing unit 30f prepares title information
indicating the correspondency between the image files and
text files. FIG. 7 shows an example of title information. In
the title information, as shown in FIG. 7, for example, a text
file name of a text file registered in the database 30c is
associated with information indicating a medium storing a
corresponding image file and a path (directory) indicating a
logical storage position in the medium storing the image file.
The information indicating the medium is a drive name in
the case where the image file is stored in the hard disk 30d.
However, in the case where the image file is stored in the
MO in the autochanger of hierarchical storage 32, that
information indicates the MO itself, and not the drive name
of the MO drive 32a. For example, when the image file is
stored on the MO medium (surface) of the slot 1 of
autochanger 32b, information "AC1" indicating the slot 1
and the surface of MO medium is set.

Each file is accompanied with the date of formation, the
date of updating and other attributive information, and it is
thus filed.

Next, the operation of searching the image file
accumulated in the document information accumulation unit 12 will
now be described (see FIG. 5).

In the personal computer 40 in the search unit 14, the
WWW client program is executed and the search request for
the image file, etc. are input through the WWW browser 40e.

FIG. 8 shows an example of the display screen of the
WWW browser 40e. As is shown in FIG. 8, the display
screen of WWW browser 40e is provided with a tool-bar on
which icons indicating commands are arranged, an area
"Location" for inputting a URL (uniform resource locator)
indicating a desired location of information, and an area (list
display portion) for displaying information.

At first, the personal computer 40 (WWW client)
transmits a URL indicating the CGI program 30b to the WWW
server 30a of document server 30 via the WWW browser
40e. The WWW server 30a of document server 30 receives
the URL and activates the CGI program 30b.

Thereby, the CGI program 30b provides a file search
screen to the WWW browser 40e via the WWW server 30a.
In addition, the CGI program 30b acquires a data source
existing in the document information accumulation unit 12.

As is shown in FIG. 9, the browser 40e displays a screen
for designating the data source for search, user ID,
password, and method of listing (bookshelf list or box
attribute list). In the input area for the data source to be
searched, a list of data sources may be presented upon
request, and the name of a selected data source may be set.
In the example of FIG. 9, the data source name "ALBA
DOC" is set.

For example, in a case where execution of display of the
bookshelf list has been designated, if necessary information
is set and connection is designated, the document server 30
provides a list of bookshelves belonging to the designated
data source. FIG. 10 shows an example of the bookshelf list
displayed by the WWW browser 40e. If a desired bookshelf
is selected and a button of "document box list" is depressed
in the state shown in FIG. 10, the display of the bookshelf
is changed to the display of the document box list as
shown in FIG. 12. Subsequently, if a desired document box
is selected and a button of "search condition" is depressed in
the state shown in FIG. 12, a search condition input screen
as shown in FIG. 13 is displayed.

In a case where the execution of the display of the box
attribute list has been designated, the document server 30
provides a list of box attributes belonging to the designated
data source. FIG. 11 shows an example of the box attribute
list displayed by the WWW browser 40e. If a desired box
attribute is selected and a button of "search condition" is
depressed in the state shown in FIG. 11, the search condition
input screen as shown in FIG. 13 is displayed, similarly with
the above-described case.

FIG. 13 shows the conditions for search and, for example,
character type keywords, numeral type keywords and
different time type keywords are designated as the conditions.
In addition, the date of formation and the date of updating
can be designated as conditions. In the case of the character
type keywords, freely chosen words of a natural language
are designated.

Referring to FIG. 3 and the flow chart of FIG. 14, a
description will now be given of an image file search process
for searching image files accumulated in the document
information accumulation unit 12 on the basis of a search
condition designated on the search condition input screen
shown in FIG. 13.

If execution of the search is designated on the screen of
FIG. 13, the personal computer 40 (WWW client) requests the
execution of the image file search to the WWW server 30a on the
basis of the search condition set through the WWW browser
40e (step B1).

The WWW server 30a, receiving the search execution
request, activates the CGI program 30b to execute the
search of text files stored in the database 30c and
corresponding to the document box to be processed, according to
the designated search condition, i.e. the word set as a
keyword (step B2). The CGI program 30b executes a full
text search for all characters included in each of the text files
to be processed and discriminates the text including the
word designated by the search condition.

The CGI program 30b acquires, as a search result, the
associated text file from the database 30c. The title
information of the associated text file is also acquired. The CGI
program 30b subjects the search result to a structuring/
documentation process, e.g. an HTML (hyper text markup
language) process (step B3).

The CGI program 30b starts to acquire an image file
accumulated in the hierarchical storage 32, which
corresponds to the text file obtained by searching the database
30c, and to copy the image file into the cache area 30e of the
hard disk 30d (step B4).

Specifically, the text file obtained by the search of the
database 30c is a candidate of the text file corresponding to
the finally selected image file and is selected by the personal
computer 40 (WWW client). Before the selection is
designated, the CGI program 30b starts the process for
storing the image file candidate of the search into the cache
area 30e of hard disk 30d, from which the file can be read
at a higher speed than from the hierarchical storage 32.
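
The search and prefetch behavior of steps B2 to B4 can be illustrated with a minimal sketch. It assumes in-memory stand-ins for the database, the title information and the cache area; the names TitleInfo, full_text_search and prefetch, the callback fetch_from_storage, and the example drive and path strings are hypothetical and are not taken from this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleInfo:
    """Illustrative title information for one text file."""
    text_file: str
    medium: str  # drive name for the hard disk, or an MO identifier such as "AC1"
    path: str    # logical storage position of the image file on that medium


def full_text_search(text_files, keyword):
    """Step B2 (sketch): names of text files whose full text contains the keyword."""
    return sorted(name for name, body in text_files.items() if keyword in body)


def prefetch(candidates, titles, hard_disk_drive, cache, fetch_from_storage):
    """Step B4 (sketch): copy candidate images into the cache before selection.

    Images already on the hard disk or already cached are skipped.
    Returns the names of the text files whose images were fetched.
    """
    fetched = []
    for name in candidates:
        info = titles[name]
        key = (info.medium, info.path)
        if info.medium == hard_disk_drive or key in cache:
            continue  # already readable at high speed, no copy needed
        cache[key] = fetch_from_storage(info.medium, info.path)
        fetched.append(name)
    return fetched
```

Starting the copy as soon as the candidates are known, rather than after the user selects one, trades some possibly unnecessary reads from the slower storage for a shorter wait when the selection arrives.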
However, if the necessary image file is already present in
the hard disk 30d, there is no need to copy the file
accumulated in the hierarchical storage 32 into the hard disk 30d.

The details of the process for acquiring the image file
from the MO in the autochanger 32b of hierarchical storage
32 will be described later.

The WWW server 30a transfers the search result (HTML
document), which was subjected to the HTML process by
the CGI program 30b, to the personal computer 40 (WWW
client) (step B5).

In the personal computer 40 (WWW client), the WWW
browser 40e analyzes the HTML document and displays a
list of search results, as selection candidates, which meet the
designated search condition (step B6).

FIG. 15 shows an example of a search result list
displayed on the screen of the WWW browser 40e. In the
example of FIG. 15, title information itself obtained from
the database 30c of document server 30 is not displayed, but
information indicating the presence of the text file meeting
the search condition is displayed.

On the display screen of the search result list shown in
FIG. 15, the search result corresponding to the desired image
file is selected and then a "page list" button or a "top page
list" button is selected. Thus, the transfer of the image file
can be requested.

For example, when the "top page list" button has been
selected to request the transfer of the image file, the personal
computer 40 (WWW client) transmits to the WWW server
30a the designated search result and the request for
transferring the image files for "top page list" (step B7).

The WWW server 30a, in response to the image file
transfer request, accesses the hard disk 30d (or cache
area 30e) and acquires the image file corresponding to the
designated search result (step B8).

The WWW server 30a transfers the image file acquired
from the hard disk 30d to the personal computer 40 (WWW
client) (step B9).

In this case, for example, as shown in FIG. 16, the WWW
browser 40e displays a top page list of documents (the
number of documents being two in FIG. 16) included in the
document box to be processed.

Subsequently, on the screen shown in FIG. 16, the top
page of any of the documents is selected and the execution
of display of "page list" is instructed. Then, as shown in FIG.

The CGI program 30b searches the text files registered in
the database 30c according to the procedure illustrated in the
flow chart of FIG. 19.

Specifically, the CGI program 30b conducts the search
based on the designated key (word), similarly with the above
case (step A1), and then determines whether the designated
word is registered in the keys of the synonym table 30g (step
A2).

If the designated word is registered in the synonym table
30g as the key shown in FIG. 18, the text files registered in
the database 30c are searched by using the synonyms
corresponding to the designated word, like the process based
on the designated word (step A3).

If the CGI program 30b has found associated text files on
the basis of the synonyms corresponding to the designated
word, the WWW server 30a transmits these
files to the personal computer 40 (WWW client) as search
results.

Accordingly, even if the word designated as search
condition is too ambiguous to search the image file, the search
based on its synonyms can be performed to acquire
associated information search candidates.

The process for acquiring the image file from the MO of
autochanger 32b of hierarchical storage 32 in step B4 in the
flow chart of FIG. 14 will now be described in detail with
reference to FIGS. 20A, 20B, 20C and 20D.

In general, at the time of file access, the drive for
activating the recording medium storing desired files is
designated along with the file path (directory). In the process
of acquiring the image file by the CGI program 30b of the
present embodiment, the medium storing the desired file and
the image file are designated, as shown in FIG. 20A.

On the basis of the title information (see FIG. 7) obtained
by searching the text files registered in the database 30c, the
CGI program 30b designates the medium storing the desired
file and the path to the image file, as shown in FIG. 20A.
If the file access is designated for an MO medium stored in
the autochanger 32b, the hierarchical storage 32
discriminates the MO medium of the associated slot, on the basis of a
conversion table shown in FIG. 20B, and mounts this MO
medium in the available drive in the MO drive 32a.

In the example shown in FIG. 20C, a drive 2 with drive
name F of four drives 1 to 4 with drive names E, F, G and
H is available. Thus, the media MO (the upper surface

17, the personal computer 40 (WWW client) displays thereof being used) of designated slot 1 is mounted in this 

through the WWW browser 40c the list of all pages (three drive. 

pages in FIG. 17) included in the associated document. The controller 32c enables the document server 30 to 

Ihus, on the basis of the word designated as search 50 make file access to the media MO mounted in accordance 

condition, a search is conducted for all characters of text with the use condition of drives 1 to 4 of MO drive 32a. 

files registered in the database 30c as incidental information Specifically, as shown in FIG. 20D, the designation of media 

al the time of registration of image files. The text file based on the title information is changed to drive name (e.g. 

candidates (search results) meeting the search condition are "F:"). Thus, the image file can be acquired from the media 

found, and the image file accumulated in the document 55 MO in slot 1 of autochanger 32b by the designation of the 

information accumulation unit 12, which corresponds to the general drive name and image file path, 

selected search result, can be obtained. Since the file access to the media MO of autochanger 32b 

In the above description, the CGI program 30b searches can be performed, as illustrated in FIGS. 20Ato 20D, there 

the text files registered in the database 30c, using only the is no need to assign the drive names (e.g. A to Z), the number 

designated word (keyword) as search condition. However, fi o of which is limited, to the media MOs. Thus, the number of 

the search may be conducted by using the synonym table media MOs provided in the autochangcr 32/; is not limited, 

30#. and the hierarchical storage 32 having a large memory 

FIG. 18 shows an example of the information registered capacity can be obtained, 

in the synonym table 30#. As shown in FIG. 18, the word Another embodiment of the present invention will now be 

designated as the word of search condition and the corre- e>5 described. Since the structure of this embodiment is sub* 

sponding synonyms are registered on the synonym table stanlially the same as that of the preceding embodiment 

3()£. (FIGS. 1 to 6), only different portions will Ik described. 
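The media-to-drive translation of FIGS. 20A to 20D can be pictured with a small sketch. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function name `resolve_path`, the dictionary layouts, the slot numbers assigned to the busy drives, and the directory `doc1` are all hypothetical.

```python
# Hypothetical conversion table (cf. FIG. 20B): media label -> autochanger slot.
MEDIA_TO_SLOT = {"MO1": 1, "MO2": 2}

# Hypothetical drive state (cf. FIG. 20C): drive name -> slot currently mounted,
# or None when the drive is free. Here only drive F is available.
drives = {"E": 7, "F": None, "G": 8, "H": 9}


def resolve_path(media: str, file_path: str) -> str:
    """Return a drive-qualified path for a file on the given medium."""
    slot = MEDIA_TO_SLOT[media]
    # Reuse a drive that already holds this medium.
    for name, mounted in drives.items():
        if mounted == slot:
            return f"{name}:/{file_path}"
    # Otherwise mount the medium in the first free drive.
    for name, mounted in drives.items():
        if mounted is None:
            drives[name] = slot  # stands in for the autochanger mounting the MO
            return f"{name}:/{file_path}"
    raise RuntimeError("no MO drive is available")


print(resolve_path("MO1", "doc1/ooo.tif"))  # F:/doc1/ooo.tif
```

Because callers name a medium rather than a drive, the number of media is bounded by the conversion table and not by the supply of drive letters, which is the point made in the description.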
Information recorded on the paper media such as
documents and slips, which is read in by the OCR 24, color
scanner 22 and digital PPC 26, is generally called
"document" hereinafter. Suppose that the document consists of
one or more pages. For example, one document consists of
plural pages, while another consists of one page. Accordingly,
the number of text files and image files, which correspond to one
document transferred from the image file input control
apparatus 20 to the filing unit 30f, corresponds to the number
of pages of the document.

The filing unit 30f stores in the database 30c the text files
to be filed which have been transferred from the image file
input control apparatus 20, and stores the image files in
the hard disk 30d or in the MO of hierarchical storage 32.

In addition, the filing unit 30f prepares data indicating the
correspondency between each word of all text files stored in
the database 30c and image files stored in the hard disk 30d
or the MO in hierarchical storage 32, which correspond to
the text including the word. The filing unit 30f stores the
prepared data indicating the correspondency in the database
30c in the form of three tables (document table, text table
and image table) to be described below.

The three tables are shown in FIGS. 21, 22 and 23. The
text table and document table are jointed by joint numbers,
and the document table and image table are jointed by joint
numbers. Instead of jointing the tables by the numbers, the
three tables may be integrated into one table.

FIG. 21 shows a data structure of the document table 100.
In the document table 100, a document number (ID number)
101 assigned to each image file of the document read in from
the OCR 24, color scanner 22 or digital PPC 26 is associated
with search keys 102 and joint number 103.

The search keys 102 comprise two items: the title of the
document and the name of the producer of the document.
These items are input in advance by key input, etc. at the
time of registering images in the system. The joint number
103 is assigned to each document number 101 in order to
joint the document table 100 and image table 108.

FIG. 22 shows a data structure of the text table 104. In the
text table 104, a word 105 extracted from the text is
associated with a document number 106 of a document
including that word and a page number 107 of a page on
which the word appears in the document.

The word 105 is, for example, a noun which is extracted
from all text files stored in the database 30c by the filing unit
30f and can be used as a key for full-text search. The
document number 106 is a number of a document from
which the word 105 is extracted, and the page number 107
is a number of a page of the document, on which the word
is described. If the text table 104 is referred to, the document
number of the document including the word and the
corresponding page number in the document can be found from
the word 105.

One document number and one page number are not
always stored in association with one word. There may be
plural documents and pages including the word 105. In this
case, plural document numbers 106 and plural page numbers
107 are stored in association with one word 105.

FIG. 23 shows a data structure of the image table 108.
One unit of image table 108 is prepared for each document.
The image table 108 is associated with the document
number of document table 100 shown in FIG. 21 by means of the
joint number. Since the document is filed in units of a page,
the page number is associated with a media number 110 for
identifying the storage medium and a file name 111.

In other words, in each image table, the media number in
hierarchical storage 32 storing the image file of the
document corresponding to the document number indicated by
the corresponding joint number is associated with the file
name.

From the image table 108 shown in FIG. 23 corresponding
to joint number 1, it is found that the document with
document number 1 comprises four pages and that each page
file is stored in the storage medium of media number MO1 in
hierarchical storage 32 with the corresponding file name.

The operation of searching the image file accumulated in
the document information accumulation unit 12 according to
this embodiment will now be described with reference to
flow charts. In this operation, an image file of a document
having a text including a predetermined word is searched
and displayed.

FIG. 24 is a flow chart illustrating the operation in the
personal computer 40 (WWW client) and FIG. 25 is a flow
chart illustrating the operation in the document information
accumulation unit 12 (document server 30).

In the personal computer 40 of search unit 14, the WWW
client program is executed and an image file search request
is input through the WWW browser 40e (step C1).

The URL indicating the CGI program 30b is transmitted
to the WWW server 30a of document server 30 from the
personal computer 40 (WWW client) through the WWW
browser 40e (step C2).

Thus, the CGI program 30b provides the file search screen
to the WWW browser 40e through the WWW server 30a.
The WWW browser 40e displays the screen (graphical user
interface (GUI)) for inputting the necessary search
conditions provided by the CGI program 30b (step C3). In this
screen display step, the input of the search keyword for
searching the image file is required. Suppose that the
operator of the personal computer 40 (WWW client) has input a
search keyword "computer" (step C4).

If the word for image file search has been designated and
the search execution instruction has been input, the personal
computer 40 transmits a search execution instruction based
on the designated search keyword to the document server 30
(WWW server 30a) (step C5).

If the search execution instruction from the personal
computer 40 on the basis of the designated search keyword
has been received (step D1), the document information
accumulation unit 12 executes the following search
operation. Specifically, the personal computer 40 (WWW client)
sends through the WWW browser 40e an image file search
execution request based on the designated search condition
to the WWW server 30a.

The WWW server 30a activates the CGI program 30b and
executes the full-text search based on the designated search
condition, i.e. the search keyword "computer", with respect
to the word 105 in the text table 104 shown in FIG. 22 (step
D2).

The CGI program 30b refers to the text table 104 stored
in the database 30c and determines whether the word
"computer" is registered. As a result, the CGI program 30b
searches the word "computer" from the word 105 of text
table 104 and acquires, for example, document number "1"
and page number "4" corresponding to the word "computer"
(step D3).

Then, the CGI program 30b refers to the document table
100 stored in the database 30c and acquires the joint number
of the document corresponding to the document number "1"
acquired from the text table 104 (step D4). In the example
of FIG. 21, the joint number corresponding to the document
number "1" is "1".
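As a rough illustration of how a text table like that of FIG. 22 could be prepared from the recognized text of each page, consider the sketch below. It rests on assumptions: the function name `build_text_table`, the sample page texts, and the choice to index every alphanumeric token (the description mentions nouns as an example) are simplifications made for the example.

```python
import re
from collections import defaultdict


def build_text_table(pages_by_document):
    """Map each word to the sorted (document number, page number) pairs containing it."""
    table = defaultdict(set)
    for doc_no, pages in pages_by_document.items():
        # Pages are numbered from 1 in the order they were read in.
        for page_no, text in enumerate(pages, start=1):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                table[word].add((doc_no, page_no))
    return {word: sorted(hits) for word, hits in table.items()}


# Hypothetical OCR output: document number -> list of page texts.
ocr_text = {
    1: ["annual report", "sales figures", "staff list", "new computer purchase"],
    2: ["computer manual", "installation"],
}

text_table = build_text_table(ocr_text)
print(text_table["computer"])  # [(1, 4), (2, 1)]
```

One word may map to several document and page pairs, which matches the remark that plural document numbers 106 and page numbers 107 can be stored for one word 105.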
Subsequently, the CGI program 30b acquires the media
number 110 and file name 111 corresponding to the page
number "4" acquired from the text table 104, with reference
to the image table 108 stored in the database 30c which
corresponds to the joint number "1" acquired from the
document table 100 (step D5). Specifically, the CGI program
30b acquires the media number "MO1" indicating the
medium in the hierarchical storage 32 storing the image file
on the fourth page of the document with document number
"1", and the file name "ooo.tif".

On the basis of the media number and file name acquired
from the image table 108, the CGI program 30b searches the
hierarchical storage and reads out the associated image file.
Specifically, the CGI program 30b reads out the image file
with file name "ooo.tif" from the MO with media number
"MO1" in the hierarchical storage 32, and transfers the
image file to the WWW server 30a. The WWW server 30a
then transfers the image file to the WWW browser 40e of
personal computer 40 (WWW client) (step D6).

Upon receiving the search result (image file) from the
WWW server 30a (step C6), the personal computer 40
(WWW client) instructs the activated WWW browser 40e to
display the image file of the fourth page of the document
with document number "1" which has the text
including the word "computer".

In the above description, only the image file
corresponding to the page number acquired from the text table 104 is
searched. Thus, only the image file of the fourth page of the
document with document number "1" is transmitted from
the hierarchical storage 32 to the personal computer 40. It is
possible, however, that the image files of all pages of the
document with document number "1" acquired from the text
table 104 are read out and transferred to the WWW browser
40e of personal computer 40 (WWW client) along with the
page number "4" of the associated page.

In this case, on the basis of the received page number "4",
the WWW browser 40e first displays the image file of the
fourth page and then selectively displays the image file of
another page in accordance with the instruction from the
operator.

In a case where not a single document number but a
plurality of document numbers and a plurality of page
numbers are acquired from
the text table 104 on the basis of the search keyword, each
page number and the title and name of the producer of the
document, which are search keys corresponding to each
document number read out from the document table 100,
may be transferred to the personal computer 40 and image
files of all pages of the document corresponding to each
document number may be read out and transferred to the
personal computer 40.

The WWW browser 40e displays, as search results, the
data transferred from the hierarchical storage 32.

FIG. 26 shows a display screen of search results. As a
result of search, a list of the titles of a plurality of acquired
documents, the names of producers and the associated page
numbers is displayed. Below the list, six command buttons
are displayed: "page list", "top page list", "select", "next
page", "previous page" and "page select".

If the document number of a desired document is clicked
and then the "select" command button is clicked, a top page
of the desired document is displayed on the entire screen. On
this screen, too, the three command buttons, "next page",
"previous page" and "page select", are displayed. By
operating these buttons, the operator can freely turn the
pages or find a desired page by short-cut.

If the document number of a desired document is clicked
and then the "page list" command button is clicked, all pages
of the desired document are displayed at a time as reduced
images on one screen, as shown in FIG. 17. If the number
of pages is too large to display all pages on one screen at a
time, a scroll operation or a screen switch operation is
performed.

If the "top page list" command button is clicked, a screen
as shown in FIG. 16 is displayed. That is, top pages of all
listed documents are displayed as reduced images on one
screen.

On the other hand, in the case of the search result screen
of FIG. 27, the top pages of all listed documents are
displayed as reduced images on one screen at a time. Under
each reduced image, the document name and the page
number of the page including the search key are arranged.
Unlike the screen of FIG. 16, the screen of FIG. 27 shows
four command buttons: "next page", "previous page", "page
select" and "select." The functions of these command
buttons are as described above.

The search result screen may display the pages including
the search keys in all listed documents as a reduced image,
as shown in FIG. 27.

[...] image file described above, image files of all pages in the
document corresponding to the document number can be
read out at a time. In addition, if the page number of the
associated page is added, the image file of this page can be
preferentially displayed. Moreover, if an image file of
another page is needed (it is highly possible to refer to a
preceding or following image file in the document), there is
no need to newly receive an image file from the document
server 30 and thus the image file can be displayed quickly.

Not only the image file read out from the hierarchical
storage 32, but also the search keys 102 (name of document,
name of document producer) acquired from the document
table 100, the page number acquired from the text table 104
and the file name acquired from the image table 108 may be
transferred to the WWW browser 40e at the same time.
Thereby, the WWW browser 40e can display not only the
image but also the aforementioned various information
acquired from each table. Thus, the display mode of the
search results becomes more effective.

For example, a list of titles may be displayed in a mode
similar to the display mode of search results shown in FIG.
15, and thus the image file of a designated page in the
document selected from the list of titles can be displayed.

The search condition is not limited to the search keyword.
The image file search can also be executed by designating a
synonym of the keyword. In this case, synonyms
corresponding to a given word are registered as words 105 on the
text table 104. The synonyms of the word input as search
condition are discriminated by using the synonym table
shown in FIG. 18. Each synonym is treated like the input
word, and the image file is searched.

In the above description, the search execution is
instructed only by designating the word. However, since the
search keys 102 (name of document, name of document
producer) corresponding to the document number 101 are
registered on the document table 100, the image file search
based on the search key can be executed as a matter of
course. Specifically, when the search key is designated as
search object, the document table 100 is searched and the
corresponding joint number is acquired. Based on the joint
number, the image file is acquired in the same manner as in
the above case. In addition, since the search object is restricted by
executing the designation of the word and the designation of
the search key in combination, the desired image file can be
acquired efficiently.
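The combination of a word designation with a search key can be sketched as a simple filter over the keyword hits. The sketch is illustrative only: the function name `search`, the sample titles and producer names, and the exact-match comparison are assumptions, not details taken from the embodiment.

```python
# Hypothetical document table (cf. FIG. 21): document number -> search keys and joint number.
document_table = {
    1: {"title": "Purchase plan", "producer": "Sato", "joint": 1},
    2: {"title": "User manual", "producer": "Tanaka", "joint": 2},
}

# Hypothetical text table (cf. FIG. 22): word -> list of (document number, page number).
text_table = {"computer": [(1, 4), (2, 1)]}


def search(word, title=None, producer=None):
    """Return (document, page) hits for the word, restricted by any given search key."""
    hits = []
    for doc_no, page_no in text_table.get(word, []):
        row = document_table[doc_no]
        if title is not None and row["title"] != title:
            continue
        if producer is not None and row["producer"] != producer:
            continue
        hits.append((doc_no, page_no))
    return hits


print(search("computer"))                   # [(1, 4), (2, 1)]
print(search("computer", producer="Sato"))  # [(1, 4)]
```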

In this manner, the search is executed by providing the
document table 100, text table 104 and image table 108, and
a given word is designated. Thus, the image file of the
associated page of the document including this word can be
directly searched and displayed. Therefore, a necessary file can
be efficiently acquired from a great number of image files stored
in the hierarchical storage 32.
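The chain of lookups through the three tables (steps D3 to D5) can be summarized in a short sketch. It is a simplified model under assumptions: the dictionary layouts, the function name `locate_pages`, the title and producer values, and the file names other than "ooo.tif" are hypothetical.

```python
# Document table (cf. FIG. 21): document number -> search keys and joint number.
document_table = {
    1: {"title": "Purchase plan", "producer": "Sato", "joint": 1},
}

# Text table (cf. FIG. 22): word -> list of (document number, page number).
text_table = {"computer": [(1, 4)]}

# Image tables (cf. FIG. 23): joint number -> {page number: (media number, file name)}.
image_tables = {
    1: {
        1: ("MO1", "aaa.tif"),
        2: ("MO1", "bbb.tif"),
        3: ("MO1", "ccc.tif"),
        4: ("MO1", "ooo.tif"),
    },
}


def locate_pages(word):
    """Follow word -> (document, page) -> joint number -> (media, file name)."""
    results = []
    for doc_no, page_no in text_table.get(word, []):
        joint = document_table[doc_no]["joint"]
        media, file_name = image_tables[joint][page_no]
        results.append((doc_no, page_no, media, file_name))
    return results


print(locate_pages("computer"))  # [(1, 4, 'MO1', 'ooo.tif')]
```

Keeping the page number in the text table is what lets the lookup return the single page file rather than the whole document.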

In the structure shown in FIG. 1, the search unit 14
communicates with the WWW server 30a of document
server 30 by means of the personal computer 40 through the
WWW browser 40e. It is possible, however, to use an
exclusive-use terminal, too, for accessing the information
accumulated in the document information accumulation unit
12.

When the exclusive-use terminal is used, the document 
server 30 is provided with functions for the exclusive-use 
terminal for communication with the exclusive-use terminal 
and an access control to the database 30c and hierarchical 
storage 32 without using the CGI program 30b. 

The method described in the above embodiments may be
applied to various apparatuses by writing programs to be
executed by the computer in recording media such as
magnetic disks (floppy disk, hard disk, etc.), optical disks
(CD-ROM, DVD, etc.) or semiconductor memories, or by
transmitting the programs through communication media.
The computer constituting the present apparatus executes
the above processing by reading in the program recorded on
a recording medium and controlling the operations
according to this program.

We claim: 

1. A document management system comprising: 

image read means for reading image information of a
document and outputting the image information;

storage means for storing the image information output
from said image read means;

means for character-recognizing character information
included in the image information output from said
image read means, and converting the character
information to text data;

means for preparing a table for managing said document,
on which a plurality of words included in the text data,
an identification code for the document, and a page
number of a page in the document on which the words
are present, are associated;

means for inputting a search keyword;

search means for specifying the document in which the
search keyword is present and the page number, with
reference to the table on the basis of the search
keyword; and

display means for displaying, in an associated manner, an
image of a top page of the document specified by said
search means and the page number.

2. A document management method comprising the steps
of:

reading image information of a document with use of
image read means;

storing the image information read by the image read
means in storage means;

character-recognizing character information included in
the image information read by the image read means,
and converting the character information to text data;

preparing a table for managing said document, on which
table a plurality of words included in the text data, an
identification code for the document, and a page
number of a page in said document on which said words are
present, are associated;

inputting a search keyword;

specifying the document in which said search keyword is
present and the page number, with reference to the table
on the basis of the search keyword; and

displaying, in an associated manner, an image of a top
page of the document specified by the search means
and the page number.

3. A memory storing a computer-executable program 
code, the program code comprising: 

means for causing a computer to read information of a 
document with use of image read means; 

means for causing a computer to store the image infor- 
mation read by the image read means in storage means; 

means for causing a computer to character-recognize
character information included in the image
information read by the image read means, and converting the
character information to text data;

means for causing a computer to prepare a table for
managing said document, on which table a plurality of
words included in the text data, an identification code
for the document, and a page number of a page in said
document on which said words are present, are
associated;

means for causing a computer to input a search keyword;

means for causing a computer to specify the document in
which said search keyword is present and the page
number, with reference to the table on the basis of the
search keyword; and

means for causing a computer to display, in an associated
manner, an image of the top page of the specified
document and the page number.

4. A document management system comprising:

image read means for reading image information of a
document and outputting the image information;

storage means for storing the image information output
from the image read means;

means for character-recognizing character information
included in the image information output from the
image read means, and converting the character
information to text data;

means for preparing a table for managing said document,
on which table a plurality of words included in the text
data, an identification code for the document, and a
page number of a page in said document on which said
words are present, are associated;

means for inputting a search keyword;

search means for specifying the document in which said
search keyword is present and the page number, with
reference to the table on the basis of the search
keyword; and

means for displaying, in a listed manner, an image of a
page specified by said page number of the document
specified by the search means.

5. A document management method comprising the steps
of:

reading image information of a document with use of
image read means;

storing the image information read by the image read
means in storage means;

character-recognizing character information included in
the image information read by the image read means,
and converting the character information to the text
data;

preparing a table for managing said document, on which
table a plurality of words included in the text data, an
identification code for the document, and a page
number of a page in said document on which said
words are present, are associated;

inputting a search keyword;

specifying the document in which said search keyword is
present and the page number, with reference to the table
on the basis of the search keyword; and

displaying, in a listed manner, an image of a page
specified by said page number of the specified document.

6. A memory storing a computer-executable program
code, the program code comprising:

means for causing a computer to read image information
of a document with use of image read means;

means for causing a computer to store the image
information read by the image read means in storage means;

means for causing a computer to character-recognize
character information included in the image
information read by the image read means, and converting the
character information to text data;

means for causing a computer to prepare a table for
managing said document, on which table a plurality of
words included in the text data, an identification code
for the document, and a page number of a page in said
document on which said words are present, are
associated;

means for causing a computer to input a search keyword;

means for causing a computer to specify the document in
which said search keyword is present and the page
number, with reference to the table on the basis of the
search keyword; and

means for causing a computer to display, in a listed
manner, an image of a page specified by said page
number of the specified document.